Ohio GIS Conference Workshop
September 22, 2020
Adam Porr
Consulting Manager
Center for Urban and Regional Analysis
When it comes to developing geoprocessing pipelines, ArcGIS ModelBuilder and Python are a combination that can't be beat! In this workshop we will describe how to build a geoprocessing pipeline that leverages the flexibility of Python and its panoply of packages while retaining the simple and familiar ModelBuilder user interface. As an example, we'll walk through a hybrid model/script that downloads U.S. Census data via the Census API, performs some simple manipulations of the data, joins it to geography polygons (also automatically downloaded from the Census website), creates a map, and publishes via ArcGIS Online. Attendees will learn about the complementary strengths and weaknesses of Python and ModelBuilder, how to use Python to fetch U.S. Census data and prepare it for use in ArcGIS, and how to incorporate Python code in a ModelBuilder model as a script tool. Attendees should have some familiarity with ModelBuilder. Familiarity with Python is helpful, but not required.
(Ask questions anytime via the chat!)
All of the content presented today is publicly available in GitLab:
https://gitlab.com/osucura/modelbuilder-and-python-two-peas-in-a-pod
The slides are available directly from the following URL:
https://osucura.gitlab.io/modelbuilder-and-python-two-peas-in-a-pod
Yes, sometimes, efficiency for recurring tasks!
But also:
Create an interactive webmap showing poverty rates for Ohio zipcodes and census tracts. Prepare a workflow for annual updates that requires minimal effort, skill, and licensing cost.
import json
import os
import shutil
import sys
import urllib

import pandas as pd

# Default path for all files created by script (can be overridden by input parameter)
DEFAULT_OUTPUT_FOLDER = os.path.normpath("./data")
# List of census variables to download:
# NAME (Description of geography)
# GEO_ID (Unique identifier for the geography)
# S1701_C01_001E (Total population for whom poverty status is determined)
# S1701_C02_001E (Below poverty level)
# S1701_C03_001E (Percent below poverty level)
ACS_VARIABLES = ["NAME","GEO_ID","S1701_C01_001E","S1701_C02_001E","S1701_C03_001E"]
COLUMNS = ["NAME","GEO_ID","TOTAL","POVERTY","POVERTY_PCT"]
try:
    year = os.path.normpath(sys.argv[1])
except IndexError:
    print("Data year is undefined.")
    sys.exit(-1)
try:
    outputFolder = os.path.normpath(sys.argv[2])
except IndexError:
    # Fall back to the default folder when no output folder is supplied
    outputFolder = DEFAULT_OUTPUT_FOLDER
boundariesSeq = ["zipcodes", "tracts", "states", "counties"]
boundariesUrl = {
    "zipcodes": "https://www2.census.gov/geo/tiger/GENZ{0}/shp/cb_{0}_us_zcta510_500k.zip".format(year),
    "tracts": "https://www2.census.gov/geo/tiger/GENZ{0}/shp/cb_{0}_39_tract_500k.zip".format(year),
    "states": "https://www2.census.gov/geo/tiger/GENZ{0}/shp/cb_{0}_us_state_500k.zip".format(year),
    "counties": "https://www2.census.gov/geo/tiger/GENZ{0}/shp/cb_{0}_us_county_500k.zip".format(year)
}
dataSeq = ["zipcodes", "tracts"]
dataUrl = {
    "zipcodes": "https://api.census.gov/data/{0}/acs/acs5/subject?get={1}&for=zip%20code%20tabulation%20area:*".format(year, ",".join(ACS_VARIABLES)),
    "tracts": "https://api.census.gov/data/{0}/acs/acs5/subject?get={1}&for=tract:*&in=state:39".format(year, ",".join(ACS_VARIABLES))
}
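The two API URLs above differ only in the geography clause and the optional state filter. A minimal sketch of a helper that assembles them follows. The names `API_TEMPLATE` and `build_acs_url` are hypothetical and do not appear in the workshop script; the sketch assumes the same ACS 5-year subject-table endpoint used above.

```python
# Hypothetical helper (not part of the workshop script): assembles a Census
# API URL for the ACS 5-year subject tables from its variable parts.
API_TEMPLATE = "https://api.census.gov/data/{year}/acs/acs5/subject?get={variables}&for={geography}:*"


def build_acs_url(year, variables, geography, state_fips=None):
    """Return the request URL for all geographies of one type.

    year       -- data year as a string, e.g. "2018"
    variables  -- list of variable names to request
    geography  -- geography name, e.g. "tract"; spaces are percent-encoded
    state_fips -- optional state FIPS code used to restrict the request
    """
    url = API_TEMPLATE.format(
        year=year,
        variables=",".join(variables),
        geography=geography.replace(" ", "%20"),
    )
    if state_fips is not None:
        url += "&in=state:{0}".format(state_fips)
    return url


tract_url = build_acs_url("2018", ["NAME", "GEO_ID"], "tract", state_fips="39")
zcta_url = build_acs_url("2018", ["NAME", "GEO_ID"], "zip code tabulation area")
```

Keeping the URL pattern in one place makes it easier to add another geography later without copying the full string.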
for dataset in boundariesSeq:
    saveFile = os.path.join(outputFolder, year, dataset + "_shp.zip")
    if (os.path.exists(saveFile)):
        print(dataset.capitalize() + " shapefile already exists for this year. Skipping download.")
    else:
        print("Downloading shapefile for " + dataset + ". (This might take a while)")
        try:
            urllib.urlretrieve(boundariesUrl[dataset], saveFile)
        except Exception as e:
            print(e)
            print("Failed to download shapefile for " + dataset)
            sys.exit(-1)
    zipOutputFolder = os.path.join(outputFolder, year, dataset + "_shp")
    if (os.path.exists(zipOutputFolder)):
        print("Deleting previously extracted data for " + dataset)
        try:
            shutil.rmtree(zipOutputFolder)
        except:
            print("Failed to remove zip folder for " + dataset)
            sys.exit(-1)
    # OMITTED: Unzip data to output folder
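The unzip step is omitted from the slide. One way it could be done with Python's standard `zipfile` module is sketched here. `extract_archive` is a hypothetical helper for illustration, not the code from the workshop script.

```python
import os
import zipfile


def extract_archive(zip_path, target_folder):
    """Extract every member of zip_path into target_folder.

    Returns the list of member names so the caller can report what
    was extracted.
    """
    # Create the target folder first if it does not exist yet.
    if not os.path.isdir(target_folder):
        os.makedirs(target_folder)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target_folder)
        return archive.namelist()
```

In the loop above, such a helper would presumably be called with `saveFile` and `zipOutputFolder` once the download and cleanup checks have passed.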
for dataset in dataSeq:
    saveFile = os.path.join(outputFolder, year, dataset + "_data.json")
    if (os.path.exists(saveFile)):
        print(dataset.capitalize() + " data already exists for this year. Deleting it.")
        try:
            os.unlink(saveFile)
        except:
            print("Failed to delete existing data for " + dataset)
            sys.exit(-1)
    print("Downloading data for " + dataset + ". (This might take a while)")
    try:
        urllib.urlretrieve(dataUrl[dataset], saveFile)
    except:
        print("Failed to download data for " + dataset)
        sys.exit(-1)
    print("Converting data for " + dataset + " to CSV")
    try:
        with open(saveFile, "r") as f:
            dataStr = f.read()
        dataObj = json.loads(dataStr)
        df = pd.DataFrame(data=dataObj[1:], columns=dataObj[0])
        df.set_index("GEO_ID", inplace=True)
        df.to_csv(saveFile.replace(".json", ".csv"))
    except:
        print("Failed to convert data for " + dataset)
        sys.exit(-1)
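The Census API returns a JSON list of lists whose first row holds the column names, which is why the script passes `dataObj[0]` as the columns and `dataObj[1:]` as the data. The sketch below illustrates that layout with a made-up response, using only the standard library so that it runs without pandas. The function name, the sample values, and the renaming of ACS variable codes to friendlier column names are assumptions for illustration; the excerpt above does not show any renaming.

```python
import json

# Variable codes as documented in the comments on the earlier slide, paired
# with the friendlier column names. Using them to rename is an assumption.
ACS_VARIABLES = ["NAME", "GEO_ID", "S1701_C01_001E", "S1701_C02_001E", "S1701_C03_001E"]
COLUMNS = ["NAME", "GEO_ID", "TOTAL", "POVERTY", "POVERTY_PCT"]

# Made-up response in the API's list-of-lists layout (header row first).
# The geography columns (state, county, tract) are appended by the API.
SAMPLE_RESPONSE = json.dumps([
    ["NAME", "GEO_ID", "S1701_C01_001E", "S1701_C02_001E", "S1701_C03_001E",
     "state", "county", "tract"],
    ["Example Tract A", "1400000US39000000001", "1000", "150", "15.0",
     "39", "000", "000001"],
    ["Example Tract B", "1400000US39000000002", "2000", "100", "5.0",
     "39", "000", "000002"],
])


def parse_acs_response(response_text):
    """Split a response into (header, records), renaming known variables."""
    rows = json.loads(response_text)
    rename = dict(zip(ACS_VARIABLES, COLUMNS))
    header = [rename.get(name, name) for name in rows[0]]
    return header, rows[1:]


header, records = parse_acs_response(SAMPLE_RESPONSE)
```

Note that the sample values are strings, so numeric columns would need converting before any arithmetic.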
C:\Python27\ArcGIS10.7\python getCensusData.py <YEAR>
year = os.path.normpath(sys.argv[1])
year = arcpy.GetParameterAsText(0).strip()
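Note: sys.argv[0] is the script filename, so the first user-supplied argument is sys.argv[1]. arcpy parameters are indexed from 0, so the same value is parameter 0.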
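A script can be kept usable both from the command line and as a script tool by reading parameters through one small wrapper. The following is a minimal sketch under that assumption; `get_parameter` is a hypothetical name, and the fallback to `sys.argv` when arcpy is unavailable is an illustration rather than something the workshop script does.

```python
import sys

try:
    import arcpy  # only available inside an ArcGIS Python environment
except ImportError:
    arcpy = None


def get_parameter(index, argv=None):
    """Return parameter `index` (0-based) as stripped text, or None if missing.

    Uses arcpy.GetParameterAsText when arcpy is importable and returns a
    non-empty value; otherwise reads argv[index + 1], because argv[0] is
    the script filename.
    """
    if arcpy is not None:
        value = arcpy.GetParameterAsText(index)
        if value:
            return value.strip()
    if argv is None:
        argv = sys.argv
    try:
        return argv[index + 1].strip()
    except IndexError:
        return None


# Simulated command line: script name followed by the data year.
year = get_parameter(0, ["getCensusData.py", " 2018 "])
output_folder = get_parameter(1, ["getCensusData.py", " 2018 "])
```

Here `year` is read from position 1 of the simulated command line, and `output_folder` is `None` because no second argument was supplied.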
arcpy.SetParameterAsText(3, saveFile.replace(".json", ".csv"))
Note: A simpler version might read as follows:
arcpy.SetParameterAsText(3, "outputFile.csv")
print("Downloading data for " + dataset)
arcpy.AddMessage("Downloading data for " + dataset)
arcpy.AddWarning("You probably don't want to do that.")
arcpy.AddError("Now look what you've done!")
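The same idea applies to messages: a thin wrapper can route output to the geoprocessing message window when arcpy is present and to the console otherwise. This is a sketch only; `log` and its `level` argument are hypothetical and not part of the workshop script.

```python
try:
    import arcpy  # only available inside an ArcGIS Python environment
except ImportError:
    arcpy = None

LEVELS = ("info", "warning", "error")


def log(message, level="info"):
    """Report a message through arcpy if available, otherwise through print."""
    if level not in LEVELS:
        raise ValueError("Unknown level: " + level)
    if arcpy is not None:
        reporters = {
            "info": arcpy.AddMessage,
            "warning": arcpy.AddWarning,
            "error": arcpy.AddError,
        }
        reporters[level](message)
    else:
        # Console fallback: prefix the message with its level.
        print("[{0}] {1}".format(level.upper(), message))
```

With a wrapper like this, a call such as `log("Downloading data for " + dataset)` reads the same in both environments.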
Purpose: Creates a file geodatabase to store the output data
Purpose: Joins attributes to spatial data and prepares the geometry and attributes for proper display on a webmap.
Note: This diagram shows a simplified flow for the zipcodes layer. The flow for the tracts layer is similar. The model also prepares a layer for counties and one for the state as a whole.
Purpose: Classifies the tracts/zipcodes into a set of 21 quantiles (median, plus deciles above and deciles below the median)
Note: This is accomplished using pandas, an open source data science library for Python. The class assignments are stored in the attribute table. This allows for consistent analysis and visualization using a variety of platforms.
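To illustrate the general idea of quantile classification, the sketch below assigns each value to one of a fixed number of near-equal-count classes by rank. It uses only the standard library rather than pandas, and it does not reproduce the 21-class scheme used in the model; the function name and sample values are made up.

```python
def quantile_classes(values, n_classes):
    """Assign each value a class index from 0 to n_classes - 1 by rank.

    Values are ranked from smallest to largest and the ranks are divided
    into n_classes groups of near-equal size. Tied values are ranked in
    their original order, so ties can fall into different classes.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(values)
    return classes


# Made-up poverty percentages for ten geographies, split into five classes.
sample = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
sample_classes = quantile_classes(sample, 5)
```

Storing the resulting class index as an attribute, as the note above describes, lets any mapping platform symbolize the layer with the same breaks.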
Purpose: Prepares a complete ArcGIS Map document composed of three layers (counties, zipcodes, and tracts), properly symbolized and including all required metadata.
Note: The arcpy.mapping module is used to assemble the map. This relies on an existing template map document and an existing layer file for each layer.
The web map, depicted below, is publicly available on ArcGIS Online.
Note: The web map itself was created manually via the ArcGIS Online web interface, but displays the layers published using the pipeline described herein. Each time the pipeline is run, the published layers are overwritten and the changes are depicted on the map.